Title of Thesis: Learning Structured Classifiers for Statistical Dependency Parsing
Abstract
In this thesis, I present three supervised approaches and one semi-supervised approach for improving statistical natural language dependency parsing. I first introduce a generative approach that uses a strictly lexicalised parsing model in which all parameters are based on words, without using any part-of-speech (POS) tags or grammatical categories. Second, I present an improved large margin approach for learning dependency parsers from treebank data that allows a more general set of linguistic features to be used. Specifically, I incorporate local constraints that enforce the correctness of each individual link, rather than just scoring the whole parse tree. To deal with sparse data, I smooth the lexical parameters according to their underlying word similarities using Laplacian regularization. Third, I present a simpler and more efficient approach to training dependency parsers by applying a boosting-like procedure to standard supervised training methods. By using logistic regression as an efficient base classifier (for predicting dependency links between word pairs), I am able to efficiently train a dependency parsing model, via structured boosting, that achieves state-of-the-art results in English and surpasses the state of the art in Chinese. Finally, I propose a novel semi-supervised training algorithm for learning dependency parsers. By combining a supervised large margin loss with an unsupervised least squares loss, I obtain a discriminative, convex, semi-supervised training algorithm for dependency parsing.
Similar resources
Learning Structured Classifiers for Statistical Dependency Parsing
My research is focused on developing machine learning algorithms for inferring dependency parsers from language data. By investigating several approaches I have developed a unifying perspective that allows me to share advances between both probabilistic and non-probabilistic methods. First, I describe a generative technique that uses a strictly lexicalised parsing model, where all the parameter...
Second Exam: Natural Language Parsing with Neural Networks
With the advent of “deep learning”, there has been a recent resurgence of interest in the use of artificial neural networks for machine learning. This paper presents an overview of recent research in the statistical parsing of natural language sentences using such neural networks as a learning model. Though it is a fairly new addition to the toolset in this area, important results have been rec...
Advances in discriminative dependency parsing
Achieving a greater understanding of natural language syntax and parsing is a critical step in producing useful natural language processing systems. In this thesis, we focus on the formalism of dependency grammar as it allows one to model important head-modifier relationships with a minimum of extraneous structure. Recent research in dependency parsing has highlighted the discriminative structur...
Logistic Online Learning Methods and Their Application to Incremental Dependency Parsing
We investigate a family of update methods for online machine learning algorithms for cost-sensitive multiclass and structured classification problems. The update rules are based on multinomial logistic models. The most interesting question for such an approach is how to integrate the cost function into the learning paradigm. We propose a number of solutions to this problem. To demonstrate the a...
Confidence Estimation in Structured Prediction
Structured classification tasks such as sequence labeling and dependency parsing have seen much interest from the Natural Language Processing and machine learning communities. Several online learning algorithms were adapted for structured tasks, such as Perceptron, Passive-Aggressive and the recently introduced Confidence-Weighted learning. These online algorithms are easy to implement, fast t...
Publication date: 2007